Improving the Scalability of Reduct Determination in Rough Sets
Author
Abstract
Rough Set Data Analysis (RSDA) is a non-invasive data analysis approach that relies solely on the data to find patterns and decision rules. Despite this non-invasive approach and its ability to generate human-readable rules, classical RSDA has not been used successfully in commercial data mining and rule-generating engines. The reason is scalability: classical RSDA slows down considerably on larger data sets and takes much longer to generate rules. This research addresses the scalability of rough sets by improving the performance of the attribute reduction step of classical RSDA, which is the root cause of its slow performance. We propose moving the entire attribute reduction process into the database. We define a new schema to store the initial data set, and then define SQL queries on this schema that find the attribute reducts correctly and faster than the traditional RSDA approach. We tested our technique on two typical data sets and compared the results with the traditional RSDA approach to attribute reduction. Finally, we highlight some issues with the proposed approach that could lead to future research.
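To make the idea of database-side attribute reduction concrete, the following is a minimal sketch and not the schema or queries defined in the paper, which the abstract does not give. It assumes a hypothetical table `decision_table` with condition attributes `a`, `b`, `c` and a decision attribute `d`, held in an in-memory SQLite database. It also assumes a consistent decision table, so that a reduct can be treated as a minimal attribute subset under which rows that agree on the subset always agree on the decision. The consistency check runs as a single `GROUP BY ... HAVING` query; the subset search is plain brute force and is only meant for illustration.

```python
import sqlite3
from itertools import combinations

# Hypothetical toy decision table: condition attributes a, b, c and decision d.
ROWS = [
    (0, 0, 0, "no"),
    (0, 1, 0, "yes"),
    (1, 0, 1, "yes"),
    (1, 1, 1, "yes"),
    (0, 0, 1, "no"),
]
CONDITIONS = ("a", "b", "c")


def make_db(rows):
    """Load the toy decision table into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE decision_table (a INTEGER, b INTEGER, c INTEGER, d TEXT)"
    )
    conn.executemany("INSERT INTO decision_table VALUES (?, ?, ?, ?)", rows)
    return conn


def is_consistent(conn, attrs):
    """Return True if rows that agree on attrs always agree on the decision d.

    The work is done inside the database: groups of equal condition values
    that map to more than one distinct decision are counted by the query.
    Column names come from the fixed CONDITIONS tuple, not from user input.
    """
    cols = ", ".join(attrs)
    query = (
        "SELECT COUNT(*) FROM ("
        f"SELECT {cols} FROM decision_table "
        f"GROUP BY {cols} HAVING COUNT(DISTINCT d) > 1)"
    )
    return conn.execute(query).fetchone()[0] == 0


def find_reducts(conn, conditions):
    """Brute-force search for minimal consistent subsets, smallest first."""
    reducts = []
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            # Skip supersets of a reduct already found: they are not minimal.
            if any(set(r) <= set(subset) for r in reducts):
                continue
            if is_consistent(conn, subset):
                reducts.append(subset)
    return reducts
```

On this toy table, calling `find_reducts(make_db(ROWS), CONDITIONS)` should yield `a, b` as the only reduct, since no single attribute and no other pair separates the decisions. For inconsistent tables the check would need to preserve the positive region instead of requiring full consistency, which this sketch does not attempt.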
Similar resources
Feature ranking in rough sets
We propose a novel feature ranking technique using the discernibility matrix, which is used in rough set theory for reduct computation. By making use of attribute frequency information in the discernibility matrix, we develop a fast feature ranking mechanism. Based on this mechanism, two heuristic reduct computation algorithms are proposed. One is for optimal reduct and the other for app...
Mushroom Plant Analysis through Reduct Technique
Real-world problems involve very large data sets, mixed types of data (continuous-valued, symbolic), uncertainty (noisy data), incompleteness (missing or incomplete data), data change, use of background knowledge, and so on. A great deal of application-related knowledge can be generated from these large data sets. Rough set theory is a methodology that can be used to deduce rules from these data sets...
Fuzzy rough set based incremental attribute reduction from dynamic data with sample arriving
Attribute reduction with fuzzy rough sets is an effective technique for selecting the most informative attributes from a given real-valued dataset. However, existing algorithms for attribute reduction with fuzzy rough sets have to re-compute a reduct from dynamic data with sample arriving, where one sample or multiple samples arrive successively. This is clearly uneconomical from a computational point ...
A New Rough Sets Model Based on Database Systems
Rough sets theory was proposed by Pawlak in the 1980s and has been applied successfully in a lot of domains. One of the major limitations of the traditional rough sets model in the real applications is the inefficiency in the computation of core and reduct, because all the intensive computational operations are performed in flat files. In order to improve the efficiency of computing core attrib...
Journal title:
Volume, issue
Pages -
Publication date: 2011